Introduction to GitHub

Marcin Stepniak

Programme of the workshop

  • Version Control Systems, Git and GitHub

  • Using Git in the RStudio IDE

  • Using Git in GitHub Desktop

What is Git?
What is GitHub?

Git

Git is a distributed version control system

  • each client (user) has its own local copy
    => local repository

  • which can be synchronized with a copy stored on an external server
    => remote repository

Git workflow

Source: http://bit-booster.com/graph.html

GitHub

GitHub is a commercial Git repository hosting service
with some extra functionalities:

  • cloud-based service

  • web-based graphical interface

  • issue tracker

  • basic project management functionalities, access granting etc.

  • Extended free plan:

    • unlimited private repositories
    • unlimited collaborators

GitHub repo page

Why do you (might) need it?

Why do you need it for yourself?

Source: https://smutch.github.io/VersionControlTutorial/

  • track changes: always have the most recent version of a file, while keeping the entire history of changes

  • experiment: you can experiment with your code
    and revert changes if needed

  • multiple instances: you can have several versions of your code
    and easily switch between them

  • in case of using a remote repository (e.g. GitHub):

    • backup in a cloud (with all the above)
    • easy access from multiple devices
    • use issues to leave notes for your future self

Why do you need it for sharing?

  • searchable repositories
  • created for sharing code
  • easy to attach a web-based description of your code (a README.md file),
    e.g. https://github.com/r-spatial/leafgl
  • easy to install directly from GitHub
    (e.g. remotes::install_github("r-lib/remotes"))

Why do you need it for collaboration?

  • Git is distributed: all collaborators work on their own local copy
    (code, its versions and all the history of changes)
  • All collaborators can work simultaneously
  • Grant access even to private repositories
  • Share (and test!) a developer version of your code
  • pull request: review changes before merging them
  • contributions are transparent
  • issues:
    • search for help, ask questions and discuss, request new features / modifications
    • easy to link to commits, cross-connect to other issues, bug tracker
    • assignees, mentions and notifications

Basic vocabulary: repository

Repo

A repository is like the main folder of your project.
It may contain any type of files, (sub)folders etc.

Golden rule: one project, one repository.

  • A local repository is located on the user’s computer.
  • A remote repository is located on a remote server.


README file: not required but highly recommended.
Describes the repo, may contain documentation, use-cases or
any other information you think may be useful for a potential user (including yourself!)

Create a new repo

Create a remote repository on GitHub


Go to your GitHub account and select New repository


Create a new repository on GitHub

Name your repository

  • Should be a character string (without spaces)

  • No official naming convention, but hyphens are mostly used to separate words

  • Great repository names are short and memorable.


Besides repository name you can:

  • add a brief description (read: you should)

  • create README.md file (yes, you want to do that)

Public or private?
Be brave and make a decision


  • a public repository is, well, public (anyone can see it, clone it, etc.)

  • a private repository is visible only to you and to anyone you grant access to. It is a good solution for:

    • testing, experiments, etc.
    • early-stage projects (which, for some reason, you don’t want to share yet)
    • all in-progress stuff
    • any project that should not be publicly visible

Public to private or private to public?
Not a problem! (usually)


  • you can change it anytime:

Add .gitignore

.gitignore contains a list of all the files that should not be included in the repo.

It’s highly recommended to include a .gitignore in your repository.

GitHub offers a list of .gitignore templates.
Select the one which fits you best.

Select a licence


New repo on github: summary

A brand new repository

Now it is your turn!

Create your first repository

  • create a new, public repository on GitHub
  • include a README and a .gitignore (R template)

Basic vocabulary: clone repo

Clone a repo


Clone creates a local copy of a remote repository

It contains all project’s files and full project’s history.
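
On the command line, cloning is a single command. The sketch below is only an illustration, assuming Git and a POSIX shell are available: a bare repository in a temporary folder stands in for GitHub, and all names (seed, remote.git, my-first-repo, the placeholder identity) are made up for the demo.

```shell
set -e
# Placeholder identity, so commits work even where Git is not configured
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Workshop User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)   # scratch folder for this demo
cd "$work"

# Build a stand-in for the remote repository (on GitHub it already exists)
git init -q seed
(cd seed && echo "# my-first-repo" > README.md \
  && git add README.md && git commit -q -m "Add README")
git clone -q --bare seed remote.git

# The actual step: clone the remote repository into a local folder
git clone -q remote.git my-first-repo
cd my-first-repo

ls                  # the project's files
git log --oneline   # the project's history
git remote -v       # 'origin' points back to the remote repository
commit_count=$(git rev-list --count HEAD)
```

With a real repository, only the `git clone` line is needed, followed by the URL of the repository.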

Git workflow: clone a repo

Clone a new repo

  1. You need to download (clone) the repo from GitHub

Clone a new repo (RStudio)

  2. Create a new RStudio project…

  3. …using Version Control

  4. Paste the link to the repo and select its location

Done!

Now it is your turn!

Clone your repo

Ignoring files (.gitignore)

What does .gitignore do?

Use .gitignore for all those files that Git should not take into account, so that they are:

  • excluded from version control and not tracked by Git

  • not included in any commit

  • not pushed to the remote repository

  • not shared with collaborators / published

They only reside on your local machine.

What to add to .gitignore?

Anything we don’t want or don’t need to keep track of

  • user-specific files / settings (e.g. MyProjectName.Rproj)

  • notes & drafts

  • data, in particular large files

  • outputs of our code: tables, figures, html rendered from Rmarkdown files, etc.

  • Any personal / confidential data:

    • passwords
    • API keys

How to understand .gitignore?

  • A line starting with # is a comment and is ignored

  • *.Rmd ignores all .Rmd files (* stands for any characters in a file name)

  • !main.Rmd states that main.Rmd should be tracked (not ignored)

  • test.R ignores any file named test.R, in any folder

  • /test.R ignores test.R only in the folder where .gitignore is located

  • test/test.R ignores the test.R file in the test folder

  • test/ ignores all files in a test folder

Tip: use templates, add manually what you need.
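
To check how the patterns behave, you can ask Git directly with git check-ignore. This is a minimal sketch, assuming Git and a POSIX shell; the file and folder names (notes.Rmd, sub, temp) are hypothetical and exist only for the demo.

```shell
set -e
work=$(mktemp -d)   # scratch folder for this demo
cd "$work"
git init -q .

# Example .gitignore built from the patterns described above
cat > .gitignore <<'EOF'
# my files (this line is a comment)
*.Rmd
!main.Rmd
/test.R
temp/
EOF

mkdir -p temp sub
touch notes.Rmd main.Rmd test.R sub/test.R temp/draft.txt

# git check-ignore -q exits with 0 when the path is ignored, 1 otherwise
report=$(for f in notes.Rmd main.Rmd test.R sub/test.R temp/draft.txt; do
  if git check-ignore -q "$f"; then
    echo "ignored: $f"
  else
    echo "not ignored: $f"
  fi
done)
echo "$report"
```

Note that /test.R only matches the file next to .gitignore, so sub/test.R is reported as not ignored.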

Now it is your turn!

Modify your .gitignore file


  • add a comment (# my files) and, below it, ignore:

    • all files of the .Rproj type
    • all files in the (future) temporary folder
  • save .gitignore

  • review the list of files in your Git pane

  • copy the lines you added to .gitignore and send them to me through chat (directly to me!)

Basic vocabulary:
commit, push and pull

Commit

A commit is a snapshot of the repository.

Commits are cheap.
Commit often.

Commits

Each commit has its own identifier.

You can:

  • access the repository as it was at a given stage
  • review the changes made by a particular commit
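
The same can be done from a shell. A small sketch, assuming Git and a POSIX shell; notes.txt and the placeholder identity are invented for the demo.

```shell
set -e
# Placeholder identity, so commits work even where Git is not configured
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Workshop User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)   # scratch folder for this demo
cd "$work"
git init -q .

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
first_id=$(git rev-parse --short HEAD)   # identifier of the first commit

echo "version 2" > notes.txt
git commit -q -am "Update notes"

git log --oneline       # one line per commit: identifier and message
git show --stat HEAD    # which files the latest commit changed

# The file as it was at the first commit
old_content=$(git show "$first_id:notes.txt")
echo "$old_content"
```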

Revision of changes

You can even compare binary files!
Use Display the rich diff

Commit messages

Each commit must contain a message.
Add useful, self-explanatory messages.

Source: https://imgs.xkcd.com/comics/git_commit.png

Push and pull changes

It’s all about syncing your local and remote repositories.


Push changes: sends recent commit(s) from your local repository to the remote one.

Pull changes: downloads recent commits from the remote repository and updates your local repository.


Tip 1: Pull before you push (sooner or later, this habit will help you keep your mental health).

Tip 2: Push often. Commit even more often.
Every time you push, you are making a cloud backup of your project.
Extra point: you get more pit stops you can refer to.
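
Here is a sketch of the push and pull cycle between two local copies, assuming Git and a POSIX shell. The bare repository remote.git stands in for GitHub, and the names laptop and office are hypothetical.

```shell
set -e
# Placeholder identity, so commits work even where Git is not configured
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Workshop User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)   # scratch folder for this demo
cd "$work"

# Stand-in for the remote repository and two local clones of it
git init -q seed
(cd seed && echo "# demo" > README.md \
  && git add README.md && git commit -q -m "Add README")
git clone -q --bare seed remote.git
git clone -q remote.git laptop
git clone -q remote.git office

# On the laptop: commit locally, then push the commit to the remote
cd "$work/laptop"
echo "x <- 1" > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"
git push -q

# In the office: pull downloads the new commit and updates the local copy
cd "$work/office"
git pull -q --no-rebase
pulled=$(cat analysis.R)
echo "$pulled"
```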

Git workflow: commits

Git pane in RStudio

  1. Pull - updates local repo using remote one
  2. Push - sends changes from your local repo to the remote one
  3. Options
    • Revert changes - discards changes made since the last commit
    • Ignore - adds a file (or files) to .gitignore
    • Open Shell
  4. Commit selected files
  5. Review the changes since the last commit
  6. Review the history of the project (all commits)

File states

  1. Committed: they form part of the repo in their current form.
    Git keeps their most recent state and the full history of changes.

  2. Untracked: Git sees them but has no clue what you want to do with them, yet.

  3. Staged: their current form is frozen and can be added to a commit.
    In case of further changes, you need to unstage and stage them again.

  4. Ignored: Git neither sees them nor cares whether they have changed.
    You need to explicitly tell Git to ignore a file.
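
On the command line, the states can be inspected with git status. A sketch, assuming Git and a POSIX shell; all file names are hypothetical. In the short format, "A " marks a staged new file, " M" a tracked file modified but not staged, "??" an untracked file and "!!" an ignored one.

```shell
set -e
# Placeholder identity, so commits work even where Git is not configured
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Workshop User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)   # scratch folder for this demo
cd "$work"
git init -q .

printf 'temp/\n' > .gitignore
echo "a" > committed.R
git add .gitignore committed.R
git commit -q -m "Initial commit"        # committed.R is now committed

echo "b" > staged.R
git add staged.R                         # staged: goes into the next commit
echo "c" > untracked.R                   # untracked: Git sees it, nothing more
mkdir temp
echo "d" > temp/scratch.txt              # ignored via .gitignore
echo "a2" >> committed.R                 # committed file, modified afterwards

status=$(git status --short --ignored)
echo "$status"
```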

File states in RStudio

  • Added: new file added to repository.

  • Untracked file.

  • Deleted: the file is no longer available in the repository.
    Important: the file can still be found when browsing the commits / history of the repository.

  • Modified: file has been changed since the last commit

Stage file in RStudio Git pane

Select which changes (modify / add / delete) should be added to the commit (i.e. which changes will be saved together with the commit)

Unselected changes remain available only in your local environment and cannot be pushed to the remote repository.
Those files, or their recent changes, are not recorded in the commit.

Commit in RStudio Git pane

When you click Commit, a new window opens.
You can review changes file by file and decide which of them you want to add to the commit.

Add a commit message


Finally, you have to add a commit message. Otherwise you get the following error:

Aborting commit due to empty commit message.

Push commit to remote repository

Once you finalize your commit, you can push your commit(s) to the remote repository.

After a moment, the commit shows up in your GitHub repository
(you need to refresh your browser).

Changes in staged file

If you stage a file and then you make some extra changes, they will not be added to the commit.

In order to include them in the current commit, you need to
unstage the file and stage it again.
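
The effect is easy to see in a shell. A sketch, assuming Git and a POSIX shell; code.R is a hypothetical file. On the command line, running git add on the file again has the same effect as unstaging and re-staging it.

```shell
set -e
# Placeholder identity, so commits work even where Git is not configured
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Workshop User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)   # scratch folder for this demo
cd "$work"
git init -q .

echo "line 1" > code.R
git add code.R
git commit -q -m "Add code"

echo "line 2" >> code.R
git add code.R            # stage the first change
echo "line 3" >> code.R   # edit the file again after staging

state=$(git status --short)   # "MM": staged changes plus later unstaged ones
echo "$state"
git diff --cached         # staged part: adds "line 2" only
git diff                  # unstaged part: adds "line 3"

git add code.R            # stage again to include the later edit
git commit -q -m "Add lines 2 and 3"
```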

Workflow: remote & local


Workflow: remote & two different locals

Now it is your turn!

Make your first commit


  • Copy the example_code.R file to your project folder.
  • Create a temporary folder
  • Copy exercice_description.R to the temporary folder
  • Review your Git pane in RStudio

  • Stage the example_code.R file
  • Click the Commit button
  • Add a commit message
  • Commit and push changes to the remote repository
  • Inspect the repository on your GitHub account

Now it is your turn!
Commit modified file

  • Open example_code.R file in RStudio
  • Open exercice_description.R file in RStudio
  • Copy lines 25-26 from exercice_description.R.
  • Replace line 17 in example_code.R with the copied lines:
my_first_plot <- ggplot(data = mpg) +
    geom_point(mapping = aes(x = displ, y = hwy))

  • Stage the modified file
  • Review the changes

  • Commit & Push

Basic vocabulary: conflicts

Conflicts

Source: https://imgs.xkcd.com/comics/git.png

Resolving conflicts

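A conflict appears when the same part of a file has been changed in different ways in two places (e.g. locally and on the remote) since the last common commit, so Git cannot merge the changes automatically.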

Conflicts have to be resolved.

Now it is your turn!
Create and resolve a conflict

Open the README file directly in your remote repository on GitHub:

Edit the file, and commit changes:

  • WITHOUT pulling changes, open README.md file in RStudio
    and modify the file (as you wish, but differently than you have done via web).

  • Save the file, stage it and add to the commit.

  • Commit changes and Push them to the remote repository

Congratulations! You have provoked your first git conflict.

Bad news: you need to somehow resolve it.

  • Pull the remote changes into your local repository.

  • Open the file and make final changes to the problematic lines
    (i.e. those between <<<<<<< and >>>>>>> followed by the commit id)
  • Remove the conflict markers (<<<<<<<, =======, >>>>>>>)
  • Commit and push your changes.
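
The whole exercise can also be replayed in a shell. This is only a sketch, assuming Git and a POSIX shell: remote.git stands in for GitHub, the clone named web plays the role of the edit made on GitHub.com, and the clone named local plays the role of the RStudio project.

```shell
set -e
# Placeholder identity, so commits work even where Git is not configured
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Workshop User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)   # scratch folder for this demo
cd "$work"

git init -q seed
(cd seed && echo "# My project" > README.md \
  && git add README.md && git commit -q -m "Add README")
git clone -q --bare seed remote.git
git clone -q remote.git web
git clone -q remote.git local

# Edit "via the web" and push
(cd web && echo "# My project (edited on the web)" > README.md \
  && git commit -q -am "Edit README on the web" && git push -q)

# Edit the same line locally WITHOUT pulling first
cd "$work/local"
echo "# My project (edited locally)" > README.md
git commit -q -am "Edit README locally"
git push -q 2>/dev/null || echo "push rejected: the remote has newer commits"

# Pull: Git cannot combine the two edits and stops with a conflict
git pull -q --no-rebase >/dev/null 2>&1 || echo "pull stopped with a conflict"
conflicted=$(cat README.md)   # contains the <<<<<<<, =======, >>>>>>> markers
echo "$conflicted"

# Resolve: write the final text without markers, then commit and push
echo "# My project (final wording)" > README.md
git add README.md
git commit -q -m "Resolve README conflict"
git push -q
```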

Resolved!

Basic vocabulary: branches

Branch

A branch is a parallel version of a repository.

The main branch of the repository is called master (repositories created on GitHub more recently call it main by default).

By creating a new branch you duplicate the current state of the repository (all files, commits etc.).

Source: https://docs.github.com/assets/images/help/branches/pr-retargeting-diagram1.png

Why use branches?

  • A branch lets you work on your project without affecting the main line of development.
  • You can experiment with your code without any harm to the previous version. You can easily discard changes if they don’t work for you.
  • You can take several different approaches and test which of them works best.
  • You can easily switch between branches, compare them against each other, review changes, etc.
  • Many contributors can work on the code in parallel.
  • Each branch can have multiple commits.
  • You can have multiple branches, and you can create a branch from any existing branch.

Create a new branch

New branch via GitHub.com

You can create a new branch via GitHub.com web:

The naming convention for branches is similar to the one for repos (e.g. this-is-my-new-branch)

New branch from GitHub.com to RStudio

Once you’ve created a new branch on GitHub.com, just pull the remote repository
and the branch appears in your local repo.

Now you can check out this new branch.
Any further commit will be added to this new branch, while master remains untouched.

New branch from RStudio

You can create a new branch directly from RStudio:

Select a name for the new branch.
Tip: Add remote, preferably using the same name as in your local repository. The new branch appears also in your remote repository:
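
A command-line sketch of the same steps, assuming Git and a POSIX shell. remote.git stands in for GitHub and experiment.R is a hypothetical file. The name of the main branch is read from the repository, because it may be master or main depending on your Git settings.

```shell
set -e
# Placeholder identity, so commits work even where Git is not configured
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Workshop User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)   # scratch folder for this demo
cd "$work"

git init -q seed
(cd seed && echo "# demo" > README.md \
  && git add README.md && git commit -q -m "Add README")
git clone -q --bare seed remote.git
git clone -q remote.git local
cd local
main_branch=$(git symbolic-ref --short HEAD)   # "master" or "main"

# Create the new branch and check it out in one step
git checkout -q -b this-is-my-new-branch
echo "plot(1:10)" > experiment.R
git add experiment.R
git commit -q -m "Try a new plot"

# Publish the branch under the same name in the remote repository
git push -q -u origin this-is-my-new-branch
git branch -a

# Back on the main branch, the experiment is not there
git checkout -q "$main_branch"
ls
```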

Now it is your turn!
Create a new branch and add a new commit

  • Create a new branch
  • Add some text to README.md (whatever you wish)
  • Replace lines 17-18 in example_code.R with lines 31-33 from exercice_description.R
  • Stage both files and commit

  • Push changes to the remote repository
  • Inspect your remote repository on GitHub.com

Basic vocabulary:
Pull request & merge

Merge

Once you are ready to incorporate the changes, you can merge them into your master branch.

This means that all commits from the given branch will be included in your master branch.
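
For comparison, a local merge on the command line might look like the sketch below (assuming Git and a POSIX shell; the branch name new-feature and the file names are hypothetical). On GitHub the same result is reached through a pull request.

```shell
set -e
# Placeholder identity, so commits work even where Git is not configured
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Workshop User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)   # scratch folder for this demo
cd "$work"
git init -q .

echo "# demo" > README.md
git add README.md
git commit -q -m "Add README"
main_branch=$(git symbolic-ref --short HEAD)   # "master" or "main"

# Two commits on a separate branch
git checkout -q -b new-feature
echo "x <- 1" > feature.R
git add feature.R
git commit -q -m "Add feature script"
echo "More details" >> README.md
git commit -q -am "Extend README"

# Merge: both commits become part of the main branch
git checkout -q "$main_branch"
git merge -q --no-edit new-feature
git log --oneline
commit_count=$(git rev-list --count HEAD)
```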

Compare branches

On GitHub.com you can compare two branches.
To do so you can either:

  • add /compare to your repository path
    (e.g. https://github.com/stmarcin/github-workshop/compare)

or

  • Click on Compare & pull request and scroll down.

Compare branches via GitHub.com

Pull request

  • A pull request opens a discussion on proposed changes.

  • By filing a pull request you are asking the repo owner to pull your changes into the repository.
    Note: you can ask yourself (as repo owner) as well!

  • Submitted changes are reviewed and then either accepted or rejected by the repository’s owner.

Add a comment to the pull request

Submit the pull request

Delete unnecessary branch

Once you confirm the merge, you can delete the old branch.

Go back to RStudio. Check out master and click Pull

>>> C:/Program Files/Git/bin/git.exe pull
From https://github.com/stmarcin/test_workshop
   2387ab3..4e6e32c  master                  -> origin/master
Updating 2387ab3..4e6e32c
Fast-forward
 README.md      | 1 +
 example_code.R | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

Delete branch in RStudio

The branch is deleted from the remote repository, but it is still visible in RStudio pop-up menu:

In order to remove it you need to:

  • go to (S)hell

  • type the following:
git fetch -p
  • and delete a branch
git branch -d name-of-your-branch
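
The sketch below replays the situation in a shell, assuming Git and a POSIX shell. remote.git stands in for GitHub; merging locally and deleting the branch inside remote.git stand in for the Merge and Delete branch buttons on GitHub.com.

```shell
set -e
# Placeholder identity, so commits work even where Git is not configured
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Workshop User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)   # scratch folder for this demo
cd "$work"

git init -q seed
(cd seed && echo "# demo" > README.md \
  && git add README.md && git commit -q -m "Add README")
git clone -q --bare seed remote.git
git clone -q remote.git local
cd local
main_branch=$(git symbolic-ref --short HEAD)   # "master" or "main"

git checkout -q -b name-of-your-branch
echo "x <- 1" > feature.R
git add feature.R
git commit -q -m "Add feature script"
git push -q -u origin name-of-your-branch

# Stand-in for merging the pull request and deleting the branch on GitHub.com
git checkout -q "$main_branch"
git merge -q --no-edit name-of-your-branch
git push -q
git --git-dir="$work/remote.git" branch -q -D name-of-your-branch

git branch -a                          # the deleted branch is still listed locally
git fetch -q -p                        # -p (prune) drops references to deleted remote branches
git branch -q -d name-of-your-branch   # delete the local branch
remaining=$(git branch -a)
echo "$remaining"
```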

Now it is your turn!
Merge a branch

  • Create a pull request
  • Merge the pull request
  • Confirm the merge
  • Delete the branch on GitHub.com
  • Check out master
  • Optional: delete your local branch (and remove it from the RStudio pop-up menu)

Annex I: Fork

Basic vocabulary:
Fork a repository

Fork

Forking a repository makes a personal copy of another user’s repository.

All commits you make are pushed to your copy of the original repository without affecting the original one.

You can establish a link to the original repo, keep your fork synchronized and make a pull request to the upstream repo.

Workflow with forked repository

  1. Once you fork a repository, you need to clone your fork to your computer.

  2. Once it is cloned, you have to set a link to the original repository.

  3. Open Shell and write the following:

    git remote add upstream https://github.com/stmarcin/repo-for-workshop-tgis
  4. Create a new branch and check it out.

  5. Add/edit files, stage them and commit changes.

  6. Push changes to your remote repository.

  7. Create a Pull request
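
The upstream link also lets you keep a fork in sync with the original repository. A sketch under stated assumptions: Git and a POSIX shell are available, the bare repositories original.git and fork.git stand in for the two repositories on GitHub, and the maintainer clone imitates someone else updating the original.

```shell
set -e
# Placeholder identity, so commits work even where Git is not configured
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Workshop User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)   # scratch folder for this demo
cd "$work"

git init -q seed
(cd seed && echo "# demo" > README.md \
  && git add README.md && git commit -q -m "Add README")
git clone -q --bare seed original.git       # the original (upstream) repository
git clone -q --bare original.git fork.git   # your fork

# Clone the fork and link it to the original repository
git clone -q fork.git my-fork
cd my-fork
main_branch=$(git symbolic-ref --short HEAD)   # "master" or "main"
git remote add upstream "$work/original.git"
git remote -v       # origin = your fork, upstream = the original

# Meanwhile, the original repository receives a new commit
git clone -q "$work/original.git" "$work/maintainer"
(cd "$work/maintainer" && echo "news" > NEWS.md \
  && git add NEWS.md && git commit -q -m "Add NEWS" && git push -q)

# Synchronize: fetch from upstream, merge, and push to your fork
git fetch -q upstream
git merge -q --no-edit "upstream/$main_branch"
git push -q
```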

Basic vocabulary: issues

GitHub issues

GitHub issues serve to communicate with users (including yourself).

  • Each issue has its own id, which can be referred to, and its own discussion thread.
  • You can refer to the issue id (#) in your commit messages, e.g. Solve #3
  • You can @mention another user so they are informed about the issue.

Now it is your turn!
Fork a repository

  • Fork the repository https://github.com/stmarcin/repo-for-workshop-tgis and clone it to your computer.
  • Add upstream
  • Open a new issue and @mention me!
  • Create a new branch (new-branch-NumberOfYourFile)
  • Edit your file, stage it and add it to a commit.
  • Push changes and open a pull request
  • In your pull request, refer to your issue

Annex II: GitHub desktop

GitHub Desktop

Set up GitHub Desktop

Create a new repository

Go to File -> New repository (or Ctrl + N)

Create a remote repository

For the sake of your mental health:
use the same name for both, local and remote repository!

Add a file to your local repository

You can just simply copy a file to your repo’s folder:

GitHub Desktop offers default commit messages
(as sophisticated as: Create file.txt or Update file.txt)

Commit and push changes

Once you commit, you’ll see that your changes are not yet pushed to your remote repo (1).

You need to push commits manually (2)

Create a new branch

In order to create a new branch go to:
Branch -> New Branch (or Ctrl + Shift + N):

Once you have created a new branch, confirm which branch you are currently on:

Update a branch

I have accidentally committed to the master branch instead of my-first-new-branch.

GitHub Desktop allows you to Update your new branch from master.

Create a new issue from GitHub Desktop

Go to Repository -> Create issue on GitHub (or Ctrl + I)

You can refer to the issue directly when writing a commit message:

Ignore file(s), discard (recent) changes

Select a file in Changes pane so you can:

  • add a file to .gitignore
  • edit a file (open in… e.g. Notepad++)
  • discard recent changes:

Merge branch from GitHub Desktop

You need to check out the branch you want to merge into (e.g. master)

When it’s done, you can delete the merged branch:

Now it is your turn!
Work with GitHub Desktop

  • Create a new public repository using GitHub Desktop
  • Publish it
  • Create an issue
  • Create my-new-branch and check it out
  • Modify a file and commit changes referring to the issue
  • Merge my-new-branch into master and delete it.

Annex III: new repo

Create a new repo

Create a repo for an existing project

Do you already have an R project and need to create a GitHub repo for it?
No problem!


The easiest way to do it (no configuration needed):

  1. Create a new repo on GitHub (remember to include .gitignore);
  2. Clone the repo to your computer
  3. Copy your project files into the folder created when cloning the repo.
  4. Commit the files of your choice (add the unnecessary ones to .gitignore)


Done!

Using {usethis} package

  1. Install the {usethis} package
install.packages("usethis")
  2. Create a local repository
usethis::use_git()
  3. Commit. Add unnecessary files to .gitignore.

  4. Create a new repo on GitHub as a remote to your local one

usethis::use_github()


Note 1: see the use_github() help page for its parameters ( ?usethis::use_github ).

Note 2: in case of problems, check the relevant chapter of Happy Git and GitHub for the useR

Annex IV: .gitignore

Types of .gitignore


  • repository .gitignore

    • Located in the root folder of the repository
    • Created together with the repository by GitHub


  • local .gitignore

    • Located in a given folder
    • Applies to the files in that folder


  • global .gitignore for every Git repository on your computer

How to create .gitignore?

On GitHub while creating repository


On GitHub at any time


In RStudio from Git pane


You can select which files you want to add to .gitignore



In the next step you can select where .gitignore will be located
and add whichever files you want, even those which do not yet exist

In R using {usethis} package


  • use_git_ignore() tells Git to ignore particular file(s)
usethis::use_git_ignore(ignores, directory = ".")
  • edit_git_ignore() opens the .gitignore file so you can edit it manually.
usethis::edit_git_ignore()

Thank you!